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Abstract 

This paper presents the development of an 0(log 2 N) parallel algorithm for the manipulator inertia matrix. 
It is based on the most efficient serial algorithm which uses the composite rigid body method. Recursive 
doubling is used to reformulate the linear recurrence equations which are required to compute the diagonal 
elements of the matrix. It results in 0(log 2 N) levels of computation. Computation of the off-diagonal elements 
involves N linear recurrences of varying-size and a new method, which avoids redundant computation of position 
and orientation transforms for the manipulator, is developed. The O(log 2 N) algorithm is presented in both 
equation and graphic forms which clearly show the parallelism inherent in the algorithm. 

1. Introduction 

A major problem in effectively realizing advanced control schemes for robotic systems has been the dif- 
ficulty of implementing the kinematic and dynamic equations required for coordination, in real time. This 
problem is accentuated by the increasing structural and task complexities of the next generation of robots 
under development for space applications. Coordination of multiple-chain systems, with compliant structures 
operating in higher speeds regimes while making and breaking contact with the environment, places stringent 
computational demands on the control system. 

One approach that has been used in advanced dynamic control schemes to obtain better performance has 
been to employ the inertia matrix to decouple the dynamics along the several axes of a robot manipulator. 
This allows either linear or nonlinear control schemes to be more effectively applied [1,2,3]. Specific tasks in 
which the inertia matrix has been applied in the control include surface tracking and object identification using 
force control [4] and computation of collision effects [5]. 

Determination of the inertia matrix involves a considerable amount of computation (approximately equal 
to that of Inverse Dynamics for a 6 degree-of-freedom manipulator). The most efficient serial algorithm for 
computing the inertia matrix was first developed by Walker and Orin [6] and requires 0(N 2 ) time on a single 
processor system. Systolic architectures have been proposed in [7] which reduce the order of the computation to 
O(N) using N processors. The composite rigid body method developed in [6] was used in [7], Essentially, sets 
of links at the end of the manipulator are considered to be fixed with respect to each other so that elementary 
physics principles may be used to compute their composite mass, center of mass, and moment of inertia. Use 
of the Newton-Euler dynamic equations on the reduced system results in efficient computation of the inertia 
matrix components. 

This paper presents the development of a parallel algorithm to compute the manipulator inertia matrix in 
0(log 2 N) time. Recursive doubling [8], which may be applied to linear recurrence equations to reduce the 
order of the computation, is used to compute the diagonal elements of the inertia matrix. Computation of 
the off-diagonal elements involves solution of N sets of linear recurrence equations of size N , N - 1, etc. for 
which recursive doubling is not easily applied. Calculation of position and orientation transforms across the 
links of a varying-size composite rigid body at the base end of the manipulator is required. A new method 
is developed to compute the off-diagonal elements in 0(log 2 N) time, and it avoids redundant computation of 
the position and orientation transforms. 

Set notation is used to develop the equations for the algorithm in a form which explicitly shows the 
parallelism available. The notation used was first developed in part in [9] where recursive doubling was used 
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to compute the Jacobian. 

Previously, recursive doubling was applied to solve Inverse Dynamics in 0(log 2 N) time [10]. Prior to this, 
Lathrop [11] had achieved similar results through a logarithmic recursion method which was derived through 
a restructuring of the fundamental computational framework for the equations. Parallel computation of the 
inertia matrix has also been considered in the context of computing robot forward dynamics [12]. Recursive 
doubling was used to compute the diagonal elements and a modified row-sweeping algorithm was used to 
compute the off-diagonal elements with a resulting algorithm which was of O(N). The work of this paper 
differs from [12] in that here quantities are not transformed to base coordinates before applying recursive 
doubling. Also, the new method for computing the off-diagonal elements results in a parallel algorithm which 
is of 0( log 2 N). More recently, Fijany and Bejczy [13] have developed two parallel algorithms which achieve 
the lower bound in the computation time of 0(log 2 N) +0(1). However, when they mapped the algorithms 
onto arrays of processors, they concluded that an O(N) parallel/pipeline algorithm will have a much improved 
efficiency with only a slight reduction in speedup. 

In the next section, a brief overview of the O(N) parallel algorithm is given. In the section following, 
the O(log 2 N) parallel algorithm is developed. The entire parallel algorithm is then summarized in a table 
in a form which shows much of the parallelism inherent in the algorithm. Finally, the work is summarized, 
conclusions are made, and several areas in which the work may be extended are discussed. 

2. O(N) Parallel Algorithm for the Inertia Matrix 

An O(N) parallel algorithm, based upon the determination of the mass, center of mass, and moment of 
inertia of a series of composite rigid bodies for an TV-degree-of-freedom open-chain manipulator, has been 
previously derived to compute the inertia matrix, H(q) [7]. It was based on the earlier work of Walker and 
Orin [6] in which an efficient 0(N 2 ) algorithm was developed for the inertia matrix to further realize efficient 
dynamic simulation on a single processor. In addition to the O(N) algorithm, various systolic architectures 
were also proposed in [7] to achieve a real-time response. A complete listing of the algorithm is shown in 
Table 1. 

Briefly, Inverse Dynamics is applied to the manipulator N times. Starting with joint N and working 
toward joint 1 , a unit acceleration is applied to a joint with all joint velocities and other joint accelerations 
equal to zero. This simply divides the manipulator into two sets of composite rigid bodies with one degree 
of freedom between them. The mass (Mi), center of mass (c t ), and moment of inertia (E,) for the composite 
rigid body at the end of the manipulator (links t through N ) are first computed recursively using basic physics 
principles. Then for each composite rigid body, the forces and moments at joint i due to a unit acceleration 
there (f^n^i) may be simply computed by applying the Newton-Euler equations of motion to the composite 
body. The component of the force along (prismatic) or moment about (revolute) the joint axis (t) is the 
diagonal component of the inertia matrix, H{ t i. The required force or moment needed to ensure zero velocity 
and acceleration at joint j (for j < t) is simply the off-diagonal element of the inertia matrix, Hjj. These 
forces and moments (fy ( j, n Jt ,*) are recursively computed for the various joints of the lower composite rigid body 
by simple resolution of the force and moment at joint i to the required points. Only the diagonal and upper 
off-diagonal elements of the inertia matrix are computed since the inertia matrix is symmetric. 

Noting Table 1, the computation of the composite rigid body parameters and the diagonal elements of 
the inertia matrix is seen to require O(N) time. The computation of the off-diagonal elements involves N 
recursions each of which may be computed in parallel, also giving O(N) time. 

3. Development of an 0( log 2 N) Parallel Algorithm for the Inertia Matrix 

The concept of recursive doubling [8] may be used to develop an 0( log 2 N) parallel algorithm to compute 
the composite rigid body parameters and diagonal elements of the inertia matrix. It has been previously applied 
to achieve 0( log 2 N) parallel algorithms for Inverse Dynamics [10,11]. In order to understand the basic concept 
and develop the notation used, computation of the composite rigid body masses for a manipulator of eight 
degrees of freedom will first be considered in this section. A parallel algorithm of 0(log 2 N) will be developed 
for computing these masses, A/, . Then, the approach will be extended to computing all of the composite rigid 
body parameters, resulting in an 0( log 2 N) parallel algorithm for computing the diagonal elements of the 
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Table 1: 0(N) Parallel Algorithm for Computing the Inertia Matrix. 


CONST 


Mn+i 

= 0 

Ctf+l 

= 0 

E/*+i 

= 0 

zo 

= [0 0 l] r 

Vi 

f 1 revolute joint 
“ \ 0 prismatic joint 

BEGIN 



{* Computation of composite rigid body parameters and diagonal elements of the inertia matrix *} 

FOR i := N TO 1 DO 
BEGIN 1 

Afj := A/,+1 + nij 

Ci := {■jj- [mi (s') + Afi+i Ci+i + ’p*+i)]| 

Ei := *17i+iE,+i i+1 Ui + Af j+ i [ (‘Ui+i Ci + i + i p* +1 - Ci) • (‘tfi+i c i+ , + V+i - Ci) 1 

— Ci+i + 'pf+1 — Ci) Ci+i + 'Pi+i — Ci) j 

+Ii + m i [(s* — Ci) • (Si — Ci) 1 — (Si — Ci) (Si — Ci) ] 

Fi := <n (Zo x Mi Ci) + 9i (Mi z 0 ) 

N, * = (Ei Zo) 
fi,i := Fi 
ni,i := Ni + CixFi 
Hi t i != V% (lli,* • Zo) 4* (f»,i * Zo) 

END1 

{* Computation of off-diagonal elements of the inertia matrix *} 

FOR ALL i := N TO 1 DO 
FOR i := i-1 TO 1 DO 
BEGIN2 

fji := f,+i,, 

n := *Uj + 1 (n J+ i,i 4- J P;+i x f>+i,«) 

£T>,i := Vj ( n i,» * zo) 4- (f ,,i • zo) 

END2 

END 

inertia matrix. 

Computation of the off-diagonal elements of the inertia matrix involves N independent recursions of varying 
size for which recursive doubling is not easily applied. The last part of this section gives an O(log 2 N) algorithm 
to compute these elements. It effectively uses the position and orientation transforms for the full N-Yink 
manipulator, which are computed while obtaining the diagonal elements, to transform forces and moments 
over a reduced, varying-size composite rigid body at the base end. 
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Figure 1: Flow of Data and Computation for Determining M% 


3.1 Parallel Algorithm for M, 

As shown in Table 1 , a recursive algorithm for computing M,- is given by 


Mjv+i = 0 


(i) 

Mi = M,'+i -f m, 

for i = N, • * • , 1. 

(2) 


Eq. (2) is an example of a linear recurrence relationship for which recursive doubling [8] may be applied. 
Consider computation of Mi , the composite rigid body mass for the entire manipulator. The mass across sets 
of two links may first be computed, all in parallel. For eight links, the links are simply combined as follows: 

{1} . {2} , {3} , {4} , {5} , {6} , {7} , {8} — . {1, 2} , {3, 4} , {5,6} , {7,8} (3) 

which indicates that the size of the set, for which the mass has been computed, has doubled. Again, doubling 
the number of links in a set gives 

{1,2}, {3, 4} , {5, 6} , {7, 8} — {1, 2, 3, 4} , {5, 6, 7, 8} , (4) 

indicating that the mass is now computed fpr sets of four links. The doubling effect may be recursively applied 
(recursive doubling): 

{1>2, 3,4},{5,6,7,8} — ► {1,2, 3, 4, 5, 6, 7,8}, (5) 

so that the total mass of all eight links, Mi, is available after 3 steps (log 2 8). 

Fig. 1 shows the flow of the data and computation involved in the three steps for computing Mi (Mi > 8 ). 
In the figure, Mj t k represents the mass of links j through k. This implies the following mapping relationship: 

M,* — ► M^at (6) 

and 

mi — ► M iti (7) 
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to extend the previous notation used. 

Equations may be developed for computing Afi . The total number of levels of computation is given by 

l T = log, N. ( 8 ) 

The total number of sets of links at each level whose composite mass is to be computed in parallel is given by 

uj. — 2 ,T- ' ( 9 ) 

where l is the level number. Also, the variable ti is defined and used in Fig. 1 as the number of a set on a given 
level. 

At the first level (/ = 1), links are combined in groups of two; at the second level (/ = 2), links are combined 
in groups of four, etc. Thus, the number of links in a set at each level, the width w, is a function of / and is 
given by 


Each computational step in Fig. 1 then combines two sets of links into one: 

{» -f 1, • • •, j} , {i + 1, • • •» i} — * {* + 1> * * * » hi + 1* * * * > 

where 

i = w(u — 1 ), 
j — u>(ti — 0.5), and 
k = w ti. 


( 11 ) 


( 12 ) 

(13) 

(14) 


Using the above notation for indexing, the following set of equations formalizes the parallel computation 
of Mi (Mi >8 ). 

1 t := log, N 
FOR / := 1 TO b DO 

{ u T Zlr~> } 

FOR ALL u := 1 TO u T DO 
( i := tu (u — 1) i 

< j := w(u — 0.5) > 

{ k := wu J 

BEGIN 

M t -+i,* := M,+ij + Mj+i,t 
END 

The above equations only determine the following set of composite rigid body masses: 

{Mi >8 , Ms ) 8 , M7 i8 , M 8i8 } = {Mi, Ms, Mt, M 8 } . 


(15) 


(16) 


To obtain the other composite rigid body masses needed, in general, multiple computations must be performed 
on each set of links at each level. All necessary computational steps are shown in Fig. 2. Note that the masses 
for subsets of links, from a link in the first half of a set to the last link in the set (Eq. (11)), must be computed. 
In so doing, another parameter which gives the total number of computations for each set of links on level / 
may be defined: 


_ 9/-1 

Here, v is also defined to be the computation number for set u for a given level /. Note that 
u T * v T = 2't- 1 * 2'- 1 = 2 ,T-1 = y 


(17) 


(18) 
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Figure 2: Flow of Data and Computation for Determining Mi (A/, f jv) 


which is constant over all the levels. 

3.2 Parallel Algorithm for the Diagonal Elements 

The approach taken to compute the composite rigid body masses, Af,*, may be extended to develop the 
0(log 3 N) parallel algorithm for all of the composite rigid body parameters and to further compute the diagonal 
elements of the inertia matrix. The algorithm is summarized in the first part of Table 2 and will be discussed 
in the following paragraphs. 

First consider the computation of the composite rigid body parameters: mass (Af), center of mass (c), 
and moment of inertia (E). At a given level /, the main body of the computation is to calculate these in 
parallel, for any combination of u and v, across the sets of links from * -{- v to k. Note that the components 
of c and E are determined with respect to the coordinate system of the base link of the set (t -f v) so that 
the orientation and position transforms from links i -f v to j + 1, i+ *l7j + 1 and p #+VJ+ i, are required in the 
computations. Note also that transform of inertial quantities associated with the links, back to the base of the 
manipulator, is not required here as has been the case in other work [12]. As in the general case for recursive 
doubling [8], the transforms needed on level / are computed on level / — 1. Also, in the equations, note that 
Pi f «+i = *p* +1 , Mi t i = m f , c i t i = s* , and E t> , = I» which are all initially given for each link i. 

The ceiling function f ] and conditions on the range of the indices j and Jfc have been used so that the 
equations are appropriate for any number of degrees of freedom, N . Note that the computation is required 
only if the number of degrees of freedom N is greater than or equal to the number of the first link (j + 1) of 
the second of the sets of links to be combined (Eq. (11)). 

The composite rigid body parameters as well as the diagonal components of the inertia matrix may be 
computed in 0(log 2 N ) time and this is graphically depicted as Stage A of the computation in Fig. 3. In the 
figure, the numbers in the parentheses within a box give the associated links for which the computation is 
made. A “zeroth” level of computation is also shown in which the position and orientation transforms across 
each individual link are computed. Note that the angle for the first degree of freedom is not needed and that 
% Ui + 1 and *p* +1 are computed for link i. 
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Table 2: 0(log 2 N) Parallel Algorithm for Computing the Inertia Matrix. 


BEGINll 

b ■= N 1 


FOR I := 1 TO l T DO 
BEGIN12 



FOR ALL u := 1 TO u T AND 
FOR ALL v := 1 TO v T WITH 


{ 


i 


= tu(u-l) 

= — 0 . 5 ) 

= tlltl 


IF (* > N) THEN (* := N) 


} 


IF (i 


+ 1 < N) THEN DO 
BEGIN13 


:= » l U k+l 

P;+.,M-i : = Pi+«.j+i + ' +vu }+i Pj+»+i 

Mi+ Vf k := + W>+i,* 




C •+«,> + c >+i p* + P*+«J+i)] 

A*«+*,fc 

E >+1 , fc + E •+*,> 

+ Mj+l,* [(* +V ^>+l c j+l,k + Pi+v.j+l ” c i+t»,fc) • C i+1,* + P*+«,j+l - c i+v,k) 1 

- ( , + ,, £/ J+1 C i+1| * + Pi + *,j+l - Ci+ V| fc) Cj + i ( * + P»+v,j “ C »+*j) | 


+ A/»+*,> [( c »+« J “ C i+V,k) ■ ( C *+«S> ” C *4»,k) 1 

~ ( c *4*,j ” c i + « f fc) ( c »+v,j — C»4«»,k) 1 


END13 

END12 

FOR ALL t := 1 TO N DO 


BEGIN14 

:= <ri (zo x Mi t s + &i (Mi t N Zo) 


:= (E i( * z 0 ) + c itN x f iti 

Hi.i 

!“ (IX « ( * • Zq) + (fj,i * Zo) 

END14 



ENDll 


3.3 Parallel Algorithm for the Off-Diagonal Elements 

To obtain the off-diagonal elements of the inertia matrix, the force and moment at joint i due to a unit 
acceleration there (f tft and n iti ) should be resolved to the previous joints (f ) f i and n^) for j < i. This involves 
a total of N linear recurrences of size N, N - 1, etc. Several approaches to parallel computation of the off- 
diagonal elements are first discussed in this section. Then an efficient approach, which uses the transforms 
computed in Stage A for the diagonal elements and which achieves an 0(log 7 N) time, is detailed. 

First of all, computation of the off-diagonal terms may be computed in 0(N) time if the parallel algorithm 
given in Table 1 is employed. This results in a computation time of K\0(N)+ f^20(log 2 N) for the entire inertia 
matrix where the K\ coefficient is relatively small. This is similar to the modified row-sweeping algorithm 
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Table 2 (continued) 


BEGIN21 

FOR ALL x := 1 TO N DO 
BEGIN22 

** — n°g2 *1 

FOR I := 1 TO l x DO 
BEGIN23 



FOR ALL u := 1 TO u T AND 
FOR ALL v := 1 TO v T WITH 



IF (j + 1 < i < k) THEN DO 
BEGIN24 

fi+v,* := fj+i,* 

:= + f/ji+i (n^+i t , + P»+ l ^ ^jf+i,*) 

Hi+ Vt s := <Ti+ v (nj+v t x * zo) + ^i+v (ft+v,* * Zo) 

END24 

END23 

END22 

END21 


applied by Lee and Chang [12] when computing the inertia matrix for parallel forward dynamics computation. 

The order of the computation may be further reduced if recursive doubling is applied to each of the re- 
currences. Computation of the largest recurrence, which gives the elements of the last column of the inertia 
matrix, can then be achieved in 0(log 2 N) time. Since each of the recurrences may be computed in parallel, the 
overall computation time for the off-diagonal elements is 0(log 2 7V). The major problem with this approach, 
however, is that the position and orientation transforms ( U's and p’s) across the links of a varying-size com- 
posite rigid body at the base end of the manipulator are computed independently. This results in redundant 
computation both within this stage of computation and with Stage A for the diagonal elements. 

The most efficient method for computing the off-diagonal elements is to use the transforms computed in 
Stage A while yet completing the computation in 0(log 2 N) time. But it should be understood that these 
transforms were basically computed for a manipulator of size N only. However if the computational structure 
of Stage A is considered in Stage B for the pff-diagonal elements, then the more efficient algorithm is achieved. 
Noting Fig. 3, computation of the off-diagonal elements of column 6 is shown. Judicious elimination of many 
of the computational blocks (shown with dashed boxes) results in effective use of the transforms available from 
Stage A while still achieving 0( log 2 N) time. 

Note that the computation in the left part of Stage B is generally not needed. This is shown implemented 
in the equations, in the last part of Table 2, through appropriate conditions on the indices j and k. Essentially, 
the computation is not required unless the column number x falls within the range of the second of the sets 
of links to be combined (Eq. (11)). In Fig. 3, the numbers in parentheses within a box give the indices for the 
f ’s and n’s which are calculated. Not explicitly shown in the figure is the flow of the position and orientation 
transforms from Stage A to Stage B which are required for the computation of the off-diagonal terms. 

The 0(Iog 2 N ) parallel algorithm is shown in its entirety in Table 2. When compared with Table 1, it may 
be noted that the total number of primitive operations has increased with the parallel algorithm. In general, 
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Figure 3: Flow of Data and Computation for the Parallel Algorithm ( N = 8). 


the number of primitive operations increases as one attempts to decrease the order of the computation. In 
fact if the total number of operations decreased when going from a serial algorithm to a parallel algorithm, 
then the parallel algorithm should also be used on a serial processor since it becomes the most efficient serial 
algorithm as well. 

4. Summary and Conclusions 

This paper has outlined the development of an 0( log 2 N) parallel algorithm for computing the NxN inertia 
matrix for a robot manipulator. A listing of the algorithm is given in Table 2, and its flow of computation and 
data is shown in Fig. 3. In each case, the parallelism inherent in the algorithm is explicitly shown. 

A recursive doubling technique [8] was used to achieve computational reduction over the O(N) parallel 
algorithm listed in Table 1 in computing the diagonal elements of the inertia matrix. It avoids transformation 
of inertial quantities, associated with each link, to base coordinates. An O(log 2 N) algorithm, which uses 
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the position and orientation transforms computed for the entire manipulator when determining the diagonal 
elements, was then formulated to calculate the upper off-diagonal elements of the inertia matrix. Additional 
computation to determine the transforms over a varying-size composite rigid body (fixed set of links) at the 
base end of the manipulator is also avoided. 

Investigations have also been made in associated work to determine the relationship between the order 
of the computation and the number of processors required for implementation. As expected, as the order of 
computation decreases, the total number of processors required increases. 

Work is also under way to efficiently map the 0(log 2 N ) algorithm into a parallel architecture structure 
while accounting for communications (I/O) overhead. The basic objective is to minimize the computational 
latency while maximizing CPU utilization. Further, work is under way to increase concurrent task processing 
on multiple processors by developing a new computational model which includes effective use of prediction 
algorithms. Hopefully, the work of this paper will provide the foundation for parallel implementations of 
the inertia matrix which will facilitate effective realizations of dynamic control schemes for space telerobotic 
systems. 
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